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Abstract 


The Arnoldi process is a well known technique for approximating a 
few eigenvalues and corresponding eigenvectors of a general square 
matrix. Numerical difficulties such as loss of orthogonality and as- 
sessment of the numerical quality of the approximations as well as a 
potential for unbounded growth in storage have limited the applica- 
bility of the method. These issues are addressed by fixing the number 
of steps in the Arnoldi process at a prescribed value k and then treat- 
ing the residual vector as a function of the initial Arnoldi vector. This 
starting vector is then updated through an iterative scheme that is 
designed to force convergence of the residual to zero. The iterative 
scheme is shown to be a truncation of the standard implicitly shifted 
QR-iteration for dense problems and it avoids the need to explicitly 
restart the Arnoldi sequence. The main emphasis of this paper is on 
the derivation and analysis of this scheme. However, there are obvi- 
ous ways to exploit parallelism through the matrix- vector operations 
that comprise the majority of the work in the algorithm. Preliminary 
computational results are given for a few problems on some parallel 
and vector computers. 
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1 Introduction 


Large scale eigenvalue problems arise in a variety of settings. Often these 
very large problems arise through the discretization of a linear differential 
operator in an attempt to approximate some of the spectral properties of the 
operator. However, there are a considerable number of sources other than 
PDE. Saad gives a number of examples in [28], 

If one hopes to solve extremely large algebraic eigenvalue problems it is 
not possible to rely upon the proven methods for dense matrices such as the 
Q-R iteration due to the expense of storage requirements and arithmetic cost 
of an iteration. Fortunately, it is common to be interested only in a selected 
subset of the spectrum of a large matrix. In the symmetric setting one is 
typically interested in the extremes of the spectrum (i.e. a few of the largest 
or smallest eigenvalues). In the non-symmetric setting one is often concerned 
with determining eigenvalues with largest real part. 

The Lanczos method [19] is a popular algorithm for solving large symmet- 
ric eigenvalue problems . The Arnoldi process [1] is a generalization of the 
Lanczos method which is appropriate for finding the eigenvalues of a large 
non-symmetric matrix. These methods only require one to compute action of 
the matrix on a vector through a matrix vector product. Often this may be 
accomplished without explicit storage of the matrix and this property along 
with a number of theoretical and computational features have contributed to 
the widespread appeal of these methods. However, both of these share some 
inherent numerical difficulties which have been the subject of considerable 
research over the last two decades [ 8, 16, 25, 27]. 
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In this paper these methods will be discussed from a new perspective. 
The goal is to address the non-symmetric problem and thus the focus is on 
the Arnoldi algorithm. However, since the Arnoldi method reduces to the 
Lanczos method when the matrix is symmetric, everything that is devel- 
oped here is applicable to the symmetric case as well with obvious savings in 
computational effort available through the exploitation of symmetry. Tradi- 
tionally, the point of view has been to let the Arnoldi or the Lanczos sequence 
develop without bound while monitoring error estimates associated with the 
Ritz vectors to identify converged eigenvalues. However, if one explores the 
relation with the QR-iteration it is apparent that the Arnoldi (Lanczos) 
method is really a truncated reduction of the given matrix into upper Hes- 
senberg (tridiagonal) form. The iterative phase of the QR-method does not 
have an analogy within the traditional treatment of these algorithms. 

A variant of the Arnoldi method which includes such an iterative phase 
is developed here by analogy to the well-known implicitly shifted Q-R iter- 
ation [ 14, 33, 35] for dense matrices. Such an analogy may be developed if 
one treats the residual vector as a function of the initial Arnoldi (Lanczos) 
vector, and then attempts to iteratively improve this vector in a way to force 
the residual vector to zero. As shown here, this may be done by implicit 
application of a polynomial filter to the starting vector on each iteration. 
The implicit application of this polynomial filter is accomplished through a 
truncated version of the implicitly shifted Q-R iteration. Within this con- 
text, an updating scheme is developed which preserves an Arnoldi (Lanczos) 
factorization of predetermined size. The method generalizes explicit restart 
methods and it is possible to implement a mathematically equivalent im- 
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plicit method corresponding to all of the explicitly restarted methods that 
this author is aware of (See Section 5). 

The idea of iteratively forcing the residual to zero is not new. Variants of 
this idea were introduced early by Karush in [18]. Cullum and her colleagues 
have investigated explicit restart methods for the symmetric case [5, 6, 8], 
Most recently the idea has been explored by Saad in [28,29] by Chatelin 
and Ho in [2] and by Chronopoulos in [3] for the nonsymmetric case. All 
of these techniques use eigensystem information from the projected matrix 
to construct an updated starting vector for the Arnold! (Lanczos) process, 
and then restart this process from scratch. Here, a computational framework 
is developed which updates the Arnold! factorization instead of re-starting 
it. As just mentioned, this update procedure is completely analogous to the 
implicitly shifted QR-iteration. It is shown here that the update procedure 
will implicitly apply linear polynomial factors to the starting vector in a 
manner that will purge the starting vector of unwanted components. In this 
way invariant subspaces of predetermined dimension might be found. 

This approach has several advantages over more traditional approaches. 
The number of eigenvalues that are sought is prespecified. This fixes the 
storage and computational requirements instead of allowing them to become 
arbitrarily large. It is expected that the number of eigenvalues that are 
sought will be modest, and in this situation, orthogonality of the Arnold! 
(Lanczos) basis for the Krylov subspace can be maintained. Therefore, the 
questions of spurious eigenvalues and selective re-orthogonalization do not 
enter. Finally, the well understood deflation rules associated with the QR 
iteration may be carried over directly to the technique. 
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2 The Arnoldi Factorization 


The Arnoldi factorization may be viewed as a truncated reduction of an n x n 
matrix A to upper Hessenberg form. After k steps of the factorization one 
has 

(2.1) AV = VH + re T k 

where V 6 R nX *, V T V = /*; H € R* x * is upper Hessenberg, r G R n with 
0 = V T r. An alternative way to write (2.1) is 


( 


(2.2) AV = {V,v) 


H 


\ @ e l 


where = ||r|| and v — — r . 


From this representation, it is apparent that (2.2) is just a truncation of the 
complete reduction 


(2.3) 


A(V,V) = (V, V) 


H M \ 
Pe,el H J 


where (V, V) is an orthogonal n x n matrix and H is an upper Hessenberg 
matrix of order n — k. Equation (2.2) and hence (2.1) may be derived from 
(2.3) by equating the first k columns of both sides and setting v = Ve\. 

The factorization (2.1) may be advanced one step through the following 
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recursion formulas: 


(2.3.1) /? = ||r|| ; v = jr ; 

(2.3.2) V + = (V»; 

(2.3.3) w = Av ; ^ * ) = V + T u; ; 

(2.3.4) ff + = ( h a ) i 

(2.3.5) r,=»-V t N=(J- V'+V'+V • 

From this development it is easily seen that 

AV + = V + H + + r + e T k+l , V+V+ = J fc+1 , Vf r + = 0 . 

In a certain sense, computation of the projection indicated at Step (2.3.5) has 
been the main source of research activity in this topic. The computational 
difficulty stems from the fact that ||r|| = 0 if and only if the columns of V 
span an invariant subspace of A. When V “nearly” spans such a subspace 
||r|| will be small. Typically, in this situation, a loss of significant digits will 
take place at Step (2.3.5) through numerical cancellation unless special care 
is taken. On the one hand, it is a delightful situation when ||r|[ becomes small 
because this indicates that the eigenvalues of H are accurate approximations 
to the eigenvalues of A. On the other hand, this “convergence” will indicate 
a probable loss of numerical orthogonality in V . The identification of this 
phenomenon and the first rigorous numerical treatment is due to Paige[22,23j. 
There have been several approaches to overcome this problem : 
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(1) Complete Re-orthogonalization. 

This may be accomplished through maintaining V in product House- 
holder form [15, 34]. It may also be accomplished through the Modified 
Gram- Schmidt processes with re-orthogonalization [9, 26], More will be 
said of these alternatives later. 

(2) Selective Re-orthogonalization. 

This option has been proposed by Parlett and has been heavily re- 
searched by him and his students. Most notably, the thesis and sub- 
sequent papers and computer codes of Scott have developed this idea 
[24, 25, 31]. The general scheme is described in [25]. 

(3) No Re-orthogonalization. 

This last option introduces the almost certain possibility of introducing 
spurious eigenvalues. Various techniques have been developed to detect 
the presence of spurious eigenvalues [7, 8]. However, they do not appear 
when even a modest level of linear independence has been imposed on 
the Arnoldi vectors. 

Computational cost has been cited as the reason for not employing com- 
plete orthogonalization of the Arnoldi (or Lanczos) vectors. However, the 
cost will be reasonable if one is able to fix k at a modest size and then up- 
date the starting vector Uj = V e\ while repeatedly doing k- Arnoldi steps. 
This approach has been explored to some extent in [2, 28]. In the symmetric 
case Cullum [6] relates a variant of this approach (which has been termed 
an s-Step method) to applying a fixed number of conjugate gradient steps to 
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a minimize (maximize) ( VV T , A) where ( B , A) = trace(B T A) is the Frobe- 
nius product functional with V restricted to the generalized block Krylov 
subspace. However, while this argument gives considerable credence to the 
restart procedure, it does not establish convergence. 

Throughout the remainder of this paper, the fc-step approach will be de- 
veloped from a different point of view. An attempt will be made to iteratively 
update V\ in order to force the residual vector r(ui) to zero. In order to make 
sense of this it will be necessary to understand when r is indeed a function 
of Vi and also to determine its functional form and characterize the zeros of 
this function. 

The classic simple result that explains when r is a function of Vi is the 
Implicit Q- Theorem. 

Theorem 2.4 Suppose 

AV = VH + rel 
AQ = QG + fe T k 

where Q, V have orthonormal columns and G , H are both upper Hessenberg 
with positive subdiagonal elements. 

If Qe i = Ve x and Q T f = V T r = 0, then Q = V , G = H , and f = r. 

Proof: There is a straightforward inductive proof (or see [16,p367]). □ 

Of course the Krylov space 

)Ck(A,vi) = Span {i>j, Ayi, A 2 vi , . . . , A fc-1 Ui} 
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plays an important role along with the Krylov matrix 

K = (vi,Avi,...,A k ~ 1 v 1 ) . 

An alternate derivation of the Arnoldi process is to consider the companion 
(or Frobenius) matrix 

70 

71 

1 Ik-i 

and to observe that 

(2.5) AI< - I<F = re T k 

where r = A k v 1 - Kg with g T = ( r 0 ,g T ). Note that r = p{A)v x where 

p( A) = A* + Ej=o7j^> and also that P(^) is the characteristic polynomial 

of F . If g is chosen to solve min then r is orthogonal to all 

vectors in KJA, t'A Moreover p solves min {||p(A)ui||) where VMk is the 

peVMk 

set of all monic polynomials of degree k. 

To solve the minimization problem in (2.5), one would factor A = QR 
where Q is orthogonal, R is upper triangular. Note that R is nonsingu- 
lar if and only if I< has linearly independent columns and that Q may be 
constructed so that pjj = ej Rej > 0. One then solves 

g = R~ 1 Q T A k v 1 . 

This choice of g will minimize the residual and also will assure that 0 = 
Q T r. Multiplying (2.5) on the right by R~ l gives 

A(KR~ l ) - (KR-')RFR- 1 = re T k R~ l , 
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i.e. 


(2.6) AQ -QG = fel 

where Q = K R' 1 , G = RFR~ l is upper Hessenberg with the same character- 
istic polynomial as F, and / = j^r. It is easily verified that v t = Qe x = V e x 
, and 0 = Q T f . Thus, the Implicit Q-Theorem will imply that Q = V, 
G = H, and / = r. Putting H = G yields 


Moreover, 


gives 


fij — e-^Hej — ej +l RFR < 


Pj+U+i 

PA 


T-llfeMMI = ft = 

Pa pa 

Pi+ ld+i = II oil “ lift MM II • 


This discussion establishes the following. 


Theorem 2.7 Let AVj = VjHj + rjeJ be a sequence of successive Arnoldi 
steps 1 < j < k and suppose that dim(X)jt(A, Ui)) = k. Then 


( 1 ) 


O = 


■ ft MM , Pj = 


lift MM II 


PH(%ir jv " ri ’ lift- 1 mm ii 

where pj(\) is the characteristic polynomial of Hj. Moreover, 


(2) ft solves p min^{||p(v4)u 1 ||} 

for 1 < j < k. 

The development leading to Theorem (2.7) follows and builds upon the 
development by Ruhe in [27], The fact that ||ftMM|| (the characteristic 
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polynomial of H k acting on v l ) will minimize ||p(i4)w 1 || over all monic poly- 
nomials of degree k was proved by Saad in [29]. Theorem (2.7) points out 
that this minimum principle is is not useful in assessing the behavior of the 
residual obtained through the Arnoldi process. 

Since r k = 0 if and only if Uj is in the null space of some fc-degree polyno- 
mial in A, it is likely that r k = 0 if and only if v k is the sum of ^-eigenvectors 
of A. We establish this result in the following. 

Theorem 2.8 Let AV k - V k H k = r k e T k be a k-step Arnoldi factorization of 
A, with Tj ± 0 , 0 < j < k - 1 . Then r k = 0 and H k is diagonalizable 
if and only if v j = where {x,} is a set of k linearly independent 

eigenvectors for A . 

Proof: Suppose i?i = Xj, where {xj} &re a set of linearly independent 
eigenvectors for A . Then 

K k+ i(A,Vi) = Span {u a , Av u . . . , A k vi} C Span ( { x _,} ) 


and since r^_i ^ 0 =>• p k ~i{A)vi ^ 0 

.. 1 


F* = 


rl|Pfc(^) u ill = 0 


\\Pk-i(A)vi\\ 

must hold because the k -\- 1 vectors {uj , Av %, . . . , lie in a A;-dimensional 

subspace. Moreover, rj ^ 0 for 0 < j < k — 1 implies dim/Cfc(A,t>i) = k and 
since JC k {A, v t ) C Span {xj} these two subspaces must be identical. Thus the 
columns of V k form a basis for Span {xj} and hence x 2 = V k y } for 1 < j < k. 
It follows easily now from the Arnoldi factorization that { yj } is a set of k 
linearly independent eigenvectors for H. 
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Suppose now that r k = 0 and H k is diagonalizable. Then 

AV k yj = VH k yj = \jV k yj 

for every eigenpair (j/j,Aj) of H k . Since H k is diagonalizable, {yy } hence 
X = {xj : Xj = V k yj} is a linearly independent set of eigenvectors for A. The 
set of vectors x must be a basis for K, k {A,vi ) so v\ = £* =1 ®j x j- If an y = 0 
for 1 < j < A:, then would be the sum of fewer than k eigenvectors so Tj 
would vanish for some 1 < j < k — 1 by the first part of the proof. □ 

The presence of non-trivial Jordan blocks in H can be dealt with by 
introducing generalized eigenvectors. There is an analogous statement and 
proof but the details are tedious. 

Now that the nature of the residual has been exposed and now that a 
criterion for this residual to vanish has been set forth it is possible to devise 
algorithms to accomplish this goal. The point of view that shall be taken for 
the derivation of these algorithms has considerable analogy with the standard 
Qi2-iteration. In the next section this iteration is discussed in a framework 
that will aid in the derivation of the new algorithms. 

3 Relation to the QR Algorithm 

In order to motivate the point of view put forth in the remainder of this 
paper, it will be instructive to derive and analyze the QR iteration from a 
certain point of view. 

To do this, suppose that there has been a complete reduction of A to 
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upper Hessenberg form. Thus 


(3.1) AV -VH = 0 , V T V = /„ , //-upper Hessenberg. 

The explicitly shifted QR algorithm consists of the following four steps. Let 
p be the shift and let ( H — pi) = QR with Q orthogonal and R upper 
triangular. Then 

(3.1.1) (A - pI)V - V{H - pi) = 0 

(3.1.2) {A-fiI)V -VQR = 0 

(3.1.3) (A- f iI)(VQ)-(VQ)(RQ) = 0 

(3.1.4) A(VQ) - (VQ)(RQ + fil) = 0 

After these four steps we have updated (3.1) to produce 

(3.2) AV + -V + R+=0 

where V+ = VQ, and H + = RQ + /// is upper Hessenberg. Note that from 

(3.1.2) and (3.2) it follows that 

(A - nl)v i = vfpu 

where p u = ej Re u vf = V+e x . Moreover, from (3.1.3) 

(VQ)~ l (A - pi)- 1 - (RQr^VQ)- 1 = 0 . 

Hence 

Q t V t (A - pi)- 1 - Q T R~ x Vj = 0 , 
i.e. 

V T (A-pI)~ 1 -R- x Vl = 0 
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so that 

(A M-0 — Pnn^n 

where v n = V e n , i>+ = V^e*. This proves the well known that the QR 
iteration is performing inverse iteration [33] with respect to A T on the last 
column of V. 

An implicitly shifted QR step starting with (3.1) consists of 

(3.3) A(VQ) - (VQ)(Q t HQ) = 0 

where the orthogonal matrix Q is computed as a product of Givens transfor- 
mations which are specified implicitly through the well known U bulge chase” 
sequence as described in [25, pi 59, 33] once the shift fi is specified. From 
the previous discussion, the application of p implicit shifts will result in the 
implicit application of a polynomial xfi of degree p to the vector v x . Thus 
once the p shifts have been applied 

(3.4) AV+ - V+H+ = 0 

where V+ = VQ 1 Q 2 • • • Q p , H + = • • • QjQ'[HQ 1 Q 2 ■ ■ ■ Q v with v? = V + ej 

satisfying 

vf = 0(A)v, 

where 0(A) = \ FIy=i — Pj) with T a normalizing factor to make ||t>jf"|| = 1 
and {fij} the set of p implicit shifts. 

From this point of view, one may interpret the QR iteration as a process of 
rapidly determining an approximate root /x of the characteristic polynomial 
and then applying the linear factor A— pi to v\ to replace it with vf «— “(A — 
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Hl)vi in order to purge the starting vector of components along eigenvectors 
associated with p. As the iteration proceeds, subdiagonal elements of H 
must tend to zero according to Theorem (2.8). 

In the next section, the mechanics of applying an implicit shift to an 
Arnoldi factorization will be developed. Eventually, the goal will be to choose 
shifts in a way that will damp or filter out the components of unwanted 
eigenvectors in the starting vector Vi . In the large scale setting, it is not 
viable to apply an implicit shift corresponding to each unwanted eigenvalue. 
Therefore, polynomial filtering techniques will have to be utilized. 

4 Updating the Arnoldi Factorization 

In this section a direct analogue of the QR iteration will be derived. This 
will lead to an updating formula that can be used to implement an iterative 
technique for forcing the residual r k to zero by iteratively forcing v ! into a 
subspace spanned by k eigenvectors. 

Throughout this discussion, the integer k should be thought of as a fixed 
pre-specified integer of modest size. Let p be another positive integer, and 
consider the result of k + p steps of the Arnoldi process applied to A which 
has resulted in the construction of an orthogonal matrix V k+p such that 

(4.1) AV k +p = V k + p Hk+p + r k+p e k+p 

= (Vk+pi v k+p+ 1 ) 

An analogy of the explicitly shifted QR algorithm may be applied to this 
truncated factorization of A. It consists of the following four steps. Again, 


Hk+p \ 
Pk+p^k+p ) 
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let p be the shift and let ( H - pi) = QR with Q orthogonal and R upper 
triangular. Then (putting V = V k+P , H = H k+P ) 

(4.1.1) (A-pI)V-V(H-pI) = r k+p el+ p 

(4.1.2) (.4 - pI)V - V QR = r k+p ej +p 

(4.1.3) (A - pI)(VQ) - (VQ)(RQ) = r k+p e T k+p Q 

(4.1.4) A(VQ) - ( VQ)(RQ + til) = r k+p e T k+p Q 

Note that just as in (3.1.1) - (3.1.4), if one takes V+ = VQ and H + = RQ+fil, 
then H+ is upper Hessenberg and 

(A - til)v i = Uj + Pw 

where p n = ej Re u vf = V+e x just as before so long as eJ +p Qe i = 0. Since 
Q is an upper Hessenberg matrix of order k+p . This idea may be extended 
for up to p shifts being applied successively. The development will continue 
using the implicit shift strategy. 

The application of a QR iteration corresponding to an implicit shift p 
produces an upper Hessenberg orthogonal Q € R fc+P such that 

An application of p implicit shifts therefore results in 

(4,) 

where Vj.+ p = V k+P Q, H k+P — Q T H k +pQ, aR d Q = QiQi ■ ■ ■ Q P , with Q } the 
orthogonal matrix associated with the shift pj. 
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Now, partition 



so that 

(4.6) *V?=(V t + ,<iM)f j 

where v£ +1 = j? r£, r£ = {V p eJ k + v k+p+ J k+p ) and ft = ||r£||. Note that 
(V k + ) T V p ex = 0 and (V k + ) T v k+p+1 = 0 so (V k + ) T v£ +l = 0. Thus (4.6) is a 
legitimate Arnoldi factorization of A. Using this as a starting point it is 
possible to use p additional steps of the Arnoldi recursions (2.3.1) - (2.3.5) to 
return to the original form (4.1). This requires only p evaluations of a matrix- 
vector products involving the matrix A and the p-new Arnoldi vectors. This 
is to be contrasted with the Tchebyshev- Arnoldi method of Saad [28] where 
the entire Arnoldi sequence is restarted. From the standpoint of numerical 
stability this updating scheme has several advantages: 
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(1) Orthogonality can be maintained since the value of k is modest. 

(2) There is no question of spurious eigenvalues. 

(3) There is a fixed storage requirement. 

(4) The deflation techniques associated with the QR-iteration for dealing 
with numerically small subdiagonal elements of H k rnay be taken ad- 
vantage of directly. 

For the sake of clarity, the Arnoldi iteration and the updating procedure 
will be defined: 

Algorithm 4.7 

function [H, V, r] = Arnoldi (A, H,V,r,k,p) 

Input: AV — VH = re{ with V T V = Ik, V T r = 0. 

Output: AV -VH = ref with V T V = I k+P , V T r - 0. 

(1) For j = 1,2, ...,p 

(1) <— ||r||; if 0 < tol then stop; 

ft) H «“ ( pel + -_ x ) ; U V 

(3) w *— Av; 

(4) h^V T w;H 

(5) r <— w — Vh; 

(6) while ||s|| > e||r ||; 

(1) s — V T r; 
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(2) r *— r — Vs; 

(3) h ♦ — h + s ; 

Remark 1: Step (1.6) is Gram Schmidt with iterative refinement to assure 

orthogonality [9]. For details of implementation see Reichel and Gragg [26]. 
Computational experience with this device indicates that it is sufficient to 
do just one step of iterative refinement. 

With the basic Arnoldi factorization defined, it is possible to describe the 
complete iteration: 

Algorithm 4.8 

function [V, H,r] = Amupd ( A,k,p,tol ). 

(1) initialize V(:,l) = Vi; H «— (v\ T Av\); r *— Av i — v\H ; 

(2) [H,V,r\ *— Arnoldi (A, H,V,r,\,k) 

(3) For m = 1,2, .. . 

(1) t/(||r|| < tol) then stop; 

(2) [V, H,r] +- Arnoldi (A, H,V,r,p) 

( 3 ') u = Shifts (H,p) 

(4) Q r- I k+p ; 

(5) for j = 1,2, ...,p 

(1) H *— QjHQj ; (Bulge-Chase corresponding to shift pj = u(j)J 

(2) Q - QQ : : 
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(6) v^(VQ)e„. V~(VQ)( {f J ; 

(7) r *- (vftk + rcrjt) ; where j3 k — el +1 He k , cr k = e k+p Qe k 

Remark 2: The Bulge Chase at step (3.4.1) is defined implicitly as usual 

so that H - ml = QjRj] if the shifts are in complex conjugate pairs then 
the implicit double shift can be implemented to avoid complex arithmetic as 
usual. 

Remark 3: During a Bulge Chase sweep at step (3.4.1), it may happen that 

a sub-diagonal element flj becomes small. The deflation strategies associated 
with the QR algorithm are then employed. In this case, 

( H 3 M \ Hj M 

H = ~ 

^ flj e \ e j T Hj ) 0 Hj 

Thus, an invariant subspace of dimension j has been found. If j ^ k then the 
iteration is halted. Otherwise Hj , V 3 are retained and the iteration proceeds 
with Vj , Hj filling the role of V, H respectively. 

As discussed at the beginning of this section, each application of an im- 
plicit shift Hj will replace the starting vector with ( A - ji 3 I)v Thes after 
completion of each cycle of the loop at Step 2 in Algorithm (4.8): 

Vd = vi <- rp{A)vi ; 

where Tp(\) = \ Numerous choices are possible for the selection 

of these p shifts. Some possibilities will be discussed in Section 5. However, 
there is one immediate possibility to discuss and that is the case of choosing 
p “exact” shifts with respect to H. Thus the selection process might be 
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Algorithm 4.9 


function [u] = Shifts ( H,p ) 

(1) Compute A(/f) (by QR for example) 

(2) Select p unwanted eigenvalues {u(j) *— pj : 1 < j < p} C A(i/) 

Some obvious criterion for this selection might be 

(i) Sort A(i/) according to algebraically largest real part and discard the p 
smallest; 


(ii) Sort A (H) according to largest modulus and discard the p eigenvalues of 
smallest modulus; 

Selecting these exact shifts has interesting consequences in the iteration. 
Lemma 4.10 Let X(H) = {6i, . . . , 0*} U {/ii, . . . , p v } and let 

H+ = Q t HQ 


where Q = Q1Q2 • ■ ■ Q p with Qj implicitly determined by the shift pj. If 
(}j zf. 0 1 < j < k — 1 then /3fc = 0 and 


H + = 


Ht M + \ 

0 *J 


where \(HZ) = A (fi p ) = {p Moreover, 


v}~VQe,='£»i 


where each yj is a “Ritz” vector corresponding to the Ritz value 6 3 i.e. yj = 
Vsj where Hs : = sjdj 1 < j < fc. 
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Proof: Note HI = IH and after applying the p implicit shifts we have 


HQ = QH+ 


so that 

q x = Qe t = xp{H)e 1 , = - ]}(•* ~ N) ■ 

T j = i 

Therefore q x = Ej=i s j(j where Hs J = sfl, since q x = ip(H)e 1 has an- 
nihilated any component of e, along an eigenvector of H associated with 
Pj i 1 < j < p. As a consequence of Theorem (2.8), fik = 0 must hold. 
Moreover, vf = VQe i = V q x = J2j=i V s j(j — VjCj- 1=1 

This lemma provides a very nice interpretation of the iteration when exact 
shifts are chosen. Casting out the unwanted set of eigenvalues using exact 
shifts is mathematically equivalent to restarting the Arnoldi Factorization 
from the beginning after updating ux «— yj(j a linear combination of Ritz 
vectors associated with the “wanted” eigenvalues. Thus the updated starting 
vector has been implicitly replaced by the sum of k approximate eigenvectors. 

If A is symmetric and the p algebraically smallest eigenvalues of H are 
selected for deletion then this method is equivalent to the single vector s-step 
Lanczos process described by Cullum and Donath in [5] and expanded on in 
[6, 8]. This variant has the advantage that a restart of the entire Lanczos 
sequence is not required. Instead, it is updated in place and orthogonality is 
maintained. 
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5 Some Polynomial Filters 


The previous discussion has indicated that it would be advantageous to con- 
struct polynomials ip(\) of degree p which filter out certain portions of the 
spectrum of A. Several researchers have considered such schemes [5,8,28]. 
Related ideas appear throughout the literature of iterative methods for lin- 
ear systems [17,21,30]. 

A particularly appealing polynomial filter may be constructed using Tcheby- 
chev polynomials. In this case, one constructs an ellipse containing the un- 
wanted eigenvalues of H then at step (2.2) of Algorithm 4.9 the shifts p,j are 
taken to be the zeroes of the Tchebyshev polynomial of degree p associated 
with this ellipse (i.e. the polynomial of degree p which gives the best ap- 
proximation to 0 in the max norm). Construction of such an ellipse and the 
associated polynomials is discussed by Saad [29] and is based on Manteuffel s 
scheme[20]. Variants of this are presented and discussed by Chatelin and Ho 
in [2]. 

An alternative is to use exact shifts as described earlier in Section 4. 
When A € R nxn one should take these exact shifts in complex conjugate 
pairs in order to avoid complex arithmetic by using the implicit double shift 
technique. This use of exact shifts is quite effective in the symmetric case 
and may be analyzed in that setting. That analysis will be done in the next 
section. 

One may observe that both filters will have the feature of weighting ex- 
treme eigenvalues most heavily. An alternative is to construct polynomial 
approximations to step functions which take the value zero in unwanted 
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regions and one in wanted regions of the complex plane. One also might 
construct polynomials which produce an updated rq + which is a weighted 
linear combination of approximate eigenvectors corresponding to the wanted 
eigenvalues. 

In order to construct these sorts of filters it is advantageous to be able 
to apply the filter polynomial which is specified by its coefficients when ex- 
panded in the basis of polynomials constructed through the Arnoldi (Lanc- 
zos) process. To make this more precise, suppose ^ is any polynomial of 
degree less than or equal to p. Then expand $ in the form 

p+i 

j=i 

where {pj} are the Arnoldi (Lanczos) polynomials. Observe that 

ip{A)v j = Vy 

where y T = (»?i, »? 2 > •••> »?p+i> 0»0, ...,0) since 
p+i p+i 

Vy = Y v i r H = v i = Pi- i(' 4 ) u i- 

j = i j=i 

Unfortunately, the technique developed in Section 4 for the implicit appli- 
cation of to Vi is not directly applicable because the roots of are 

unknown. However, there is an analogous way to apply this polynomial 
implicitly. Assume that \\y\\ = 1 and construct a vector w a such that 

(5.1) (/ - 2w 0 w 0 t ) ei = y. 

Replace H by 

(5.2) H = (/ - 2 w 0 w 0 t )h(I - 2 w 0 w 0 T ). 
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Now, apply the Householder reduction of H to upper Hessenberg form so 
that 

H +- Q t HQ 

where 

( 5 . 3 ) Q = (/ - 2w 0 w 0 t ') (/ - 2w l w l T> ) ... (/ - 2w k+p _ 2 w k +p-2 T ) 

with each — 2wjWj T ^j being a Householder transformation constructed to 
introduce zeros below the (j + 1) — st element of the j — th column. Now, 
consider the application of Q to the Arnoldi Factorization: 

AVQ - VQ(Q t HQ ) = re k+ /Q 

In order to fit within the updating framework developed in Section 4, the 
condition 

e k+p T Qej = o, 1 <j<k. 

must hold. This is established by the following 

Lemma 5.2 The matrix Q displayed in (5.3) satisfies e k + v T Qtj = 0 , 1 < 
j < k . 

Proof: Let Qj = / - 2w jWj t for 0 < j < k + p - 2, and let = 

QTH^Qj with = H. From (5.1) it follows that w Q = 0(y — e t ), with 

j = ||j/ — ei||. Thus, ei T Q 0 — e, T for i > p+ 1. Since 

Q 0 H^ = HQ 0 

and since H is upper Hessenberg, it follows that 

e, T i/ (1) = t{ T HQ 0 = ti T H 
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for i > p + 2. From this one may conclude that e t T iei = 0 for i > p + 2 and 
thus e, r <3i = e; T for i > p + 2. Now, suppose that e, T Qj = e, T and that 
= ejHfor i> p + j + 1. Since Q } H U+1) = it follows that 

e t T H U + » = e^H^Qj = e t T H 

for i > p + j + 2, and again, one may conclude that e t T w } +\ = 0 so that 
ei T Qj + 1 = e, 7 for i > p + j + 2. This inductive argument continues to hold 
until j = Ar — 1. Hence, 

Now, observe that = tj for k — 1 < i < k p — 2 and for 1 <5 j < k to 

establish the result. D 

This observation allows the application of a polynomial filter when the 
polynomial can be expanded in the Arnoldi basis. It provides an opportunity 
to implement at some interesting options. The idea is to construct a weighted 
linear combination of approximate eigenvectors corresponding to the wanted 
eigenvalues of the matrix A. One may do this by taking a linear combination 
of the eigenvectors of the leading pxp principal submatrix of H corresponding 
to the k eigenvalues of this matrix that are the best approximations to the 
wanted spectrum. It has been assumed that p > k. Let these vectors be yj 
and form 

k 

y = Y, vm 

j=i 

and put y T = (y T 1 0) . Note that 

v, “15lS’ w<T< 
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and the vectors V p yj are the approximate eigenvectors constructed by the 
Arnoldi process from the Krylov subspace of dimension p. In [28] Saad dis- 
cuses some heuristics for choosing the weights 7 j. One possibility is to alter- 
nate the application of this polynomial with the application of a Tchebychev 
polynomial or with the polynomial constructed with exact shifts. Let us 
denote that polynomial by xp then it would be natural to take 

7 j = i/VK 0 ;) 

where the 6 : are the approximate eigenvalues in the wanted spectrum. The 
rational for such a choice is that after these two successive steps 

u, ++ = ij>(A)$(A)vi 

= 13 H A )M A Wpyj 

i - 1 

= 13 + 'l>{A)$(A)z 

J = 1 

= 13 9i + *l>{A)^{A)z 
i= 1 

Where { qj } is the set of normalized eigenvectors corresponding to the eigen- 
values of A that are approximated by 9j and the vector 0 is orthogonal to 
{£,}. Since z will nearly belong to the subspace spanned by the eigenvec- 
tors corresponding to eigenvalues belonging to a region in the complex plane 
where the polynomial t/>(A) 0 (A) is small, this choice will have the desired 
effect. Namely, the starting vector is forced to be the sum of k eigenvectors 
of A. 
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6 The Symmetric Case 


The symmetric case is important and therefore, it is worthwhile to analyze 
the k-step method in this setting. Throughout this section it will be assumed 
that 

A - A t . 

The Arnold! Factorization then reduces to its predecessor the Lanczos fac- 
torization 

AV — VT = rel 

where T is a symmetric tridiagonal matrix. In order to analyze the iteration 
induced by Algorithm (4.8) when exact shifts are taken some notation and a 
preliminary lemma must be established. 


Lemma 6.1 Let 


M = 




\ ej 


be a symmetric tridiagonal matrix. Then the roots of the equation 

1 


fel(T-\I)-'e k = 




are eigenvalues of M . 


Proof: Define M\ = M - XI , T\ = T - XI and T\ = T - A7. Then for any 
A £ A (T) U A (t) 

( h 0 \ / 7\ /?*ejtef \ 

X = { fte.eJV /„ J ( 0 7i - ) ' 
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Thus 


detM\ = detT\det{T\ — e^ejej ) 

= detT x detf x det(I - d'‘e[Tx'cte,e]f-'j 
= detTxdetTxl 1 - 

Since A(T) U A(T) is a discrete set, the expression developed for detM\ is 
seen to be valid in general by a continuity argument. □ 

With this lemma established it will be possible to analyze the iterations 
using polynomial filters constructed from exact shifts. The selection rule to 
be analyzed is to retain the k extreme eigenvalues of Tfc +p (i.e. an equal 
number of the smallest and largest eigenvalues of 7* +p ). Let m denote the 
iteration number. Then is the starting vector, and 

_ v< m) T eT 
™ *k+p V k+p 1 k+p ~ r k+p e k+p • 

Let 

/ rp(rn) 

rp( m ) _ 1 -» k Pk e k e 1 

fc+r ^ 0l m) ei el f( m > 

have eigenvalues 

{^l,m+l ^ ^2,m + l < ' ' • < @t,m + 1 < Pl,m+l < ’ ' ' 

^ fkp,m+\ $<+l,m+l ^ ‘ ^ @k,m+ 1} 

and let T have eigenvalues 

{^lm ^ @2 m ^ ^ ^ ^#+lm ^ ^ @km } 
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and let ef(f^ - A/) _1 ei have zeroes {Aim < • • • < Ap-i,m}- Then 


g(m)T T (rn) g(m) = 


ri(m + l) 


0 


o 0i m+1 ) 


4 

0 


where Q = <?i m Q 2 m • ■ • <2 pm are the orthogonal matrices constructed to 
apply the implicit shifts pi m , . . . , p pm at step 2.4 of Algorithm 4.8. And step 
(2.5) gives 

y(m+l) _ [pdm) g(m)j 

From Lemma (6.1), 

{ @j,m + 1 } U { } 

are the k + p roots of the equation 

(6.2) (4”' ) W(d" ) - A/)-'e* = 


- A/)-'e, ’ 

Lemma 6.3 Each {0j m }, m = 1,2,... is a decreasing sequence for 1 < j < 
and an increasing sequence for £ -f 1 < j < k. Moreover, 0( m < /ii( m +i) 
and P p ,(m+i) < ^+i,m for all m sufficiently large. 

Proof: It will first be shown that 9 tm < Aim for all m sufficiently large. To 
see this, suppose that 

Aim ^ ^lra • 

Define 

^ = - \I)~W 

Note that the zeroes {Aim < • • • < A P -i,m} of the function ef (T^ — A/) -1 ei 
are poles of <f. One can check that (^ m) ) 2 e^(T jk (m) - A 7) _1 e fc is an in- 
creasing positive continuous function on the interval (— oo,0i m ) and that 
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4> is unbounded above as A — ► — oo and that — r ) -4 — °° while 

</»(Airn + t) -» +oo as r — » 0. From these facts, it follows that the root 
0i,m+i < Aim and also 0 2 , m +i < Oim due to Lemma (6.1) and the rational 
structure of equation (6.2). Therefore, 

01,m + l < Aim < 02,m+l — ^lm 

The interlace theorem given by Golub and VanLoan in [I6,p479] assures 
the existence of an eigenvalue A of A between 0i >m +i and 0 2>m +i and hence 
X jm must also lie between 0 1>m+1 and 0i, m . However, this situation can occur 
at most n times since each occurrence isolates a distinct eigenvalue of A. The 
same argument may be applied in succession for 0j m l < j < l to see that 
Aim < 9 jm at most n-times. A similar argument will show 9j m < Apm at most 
n times. For m sufficiently large this will imply that 

d\m < * ' ‘ < 9(m < Aim < ' ' • < Ap-l,m < 9(+l,m < ■ ‘ ‘ < 9 km 

It follows readily from Lemma (6.1) and the rational structure of equation 
(6.2) that there is exactly one zero of the equation in each of the intervals 

( — OO,0i m ], (01m, 02m], ‘ ' 1 ($(/- l)m , 0/m], [0f+l,m , 0(/+2)m ), ' ’ , [0(Ar-l)m , 9km ) , [9 km, °°)- 

Moreover, since {@[ m) ) 2 el{T { k m) - A I)~ l e k is a strictly increasing continuous 
function on the interval (0i m ,0|+i, m ) which tends to -oo at the left endpoint 
and +oo at the right endpoint of the interval and since <^(A) alternates sign 
on crossing each pole Ajm, if follows that there are p zeros of equation (6.2) 
in the interval (0/ m ,0/+i,m) and hence 

{ A^j(m+l) } G (0/m, 0/+l,m)" 
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persists for all m sufficiently large. 

The following theorem results directly from Lemma 1.2. 

Lemma 6.4 { B jm } is decreasing and lim m -oo 0 pm = 0 3 1 < j < t while 

{0 jm } is increasing and lim m _oo 6j m = B 3 t + 1 < j < k . 

Proof : The proof of Lemma ( 6 . 2 ) implies that 0 J>m+1 < B jm . for j < l while 
0jm+i — f° r j > I for a U m sufficiently large and Aj < 9j m < A n for all j 
and all m. Since bounded monotone sequences converge the result is proved. 
□ 

The convergence of the sequences { Bjm } has been established but it is still 
not clear that the limit points will be eigenvalues of the matrix A. This shall 
be established by showing that the sequence {/?[ m) } is not bounded away 
from zero. 

Theorem 6.5 The limit points {%} of the sequences {0 jm } are eigenvalues 
of the matrix A. 

Proof: The sequence of vectors lie on the unit ball in R n and 

hence have a convergent subsequence Let Vi be the limit of this 

subsequence. It is sufficient to show that the corresponding subsequence 
converges to zero. Let e be a specified acceptance tolerance in the 
sense that the iteration is halted, or deflation occurs when a subdiagonal 
of H falls below e. Then, without loss of generality, it may be assumed 
that > e for all j ^ k (otherwise a deflation would have occurred at 

indices j < k or the iteration w T ould halt if < e for j > k ). Suppose that 

4 m,) > € for all i. It follows from the implicit-Q Theorem that the sequence of 
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matrices {T { k m,) } , { f (m,) } and the sequence of vectors {rj: m,) } must converge 
to limits T k , T and r k respectively. Moreover, -* fa = IK II- Now, the 
subsequences {0j( m ,+i)} each must converge to a root of the equation 

But, at the same time ^(m.+i) @j which leads to a contradiction. The 
contradiction arises since {#j} = A(T*.) and hence no 9j can be a root of the 
equation above. This is assured since Pki 1 0 and the last component of each 
eigenvector of T k as well as the first component of each eigenvector of T must 
be nonzero due to the nonzero off diagonal elements of Tk, T. It follows that 

Pi m ' ) - 0 . 

The assumption that > e for j < k and the implicit- Q Theorem 

imply that —»/?* = ||r fc || and since this nonnegative subsequence is not 

bounded away from zero it follows that ||r[ m,) || -> IKII = fi k = 0. From this 
one may conclude that must be eigenvalues of A. D 

These results which are based upon compactness arguments are not very 
satisfying. They do not reveal much about the behavior of of the iteration. 

A better understanding of the nature of the convergence is found in the fol- 
lowing results. The following discussion essentially shows that the expansion 
coefficients 7 = qjv[ m ^ must converge to zero for \ 3 £ (0/,$/+i) . 

Define: *„(A) = IK, *(A). Then v? = 

Lemma 6.6 Assume £> 2, £+1 < k—l. Then (i) |’h m (A,.)| > ( a" _i "h 0 l'^m(A n -i)l 

(iij |*„<A,)| > + l)"” l*«(Aj)| 
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Proof: Assume m is sufficiently large that C {9t,6e+i) holds. Consider 

the normalized polynomial 


«m(A) = 


*»( A) 


-n n 




«m( An-l) 


Note, that 0* < A„_j will imply that 

A A A n _i ^ ^ A A n-1 ^ ^ 

A„_i n jt A n _j fiji A n _j Of 

for A > A n _i and (i) follows. A similar argument using will establish 

(ii). D 

Let us now define 7 j m) = qjv[ m * where {q 3 } are the orthonormal eigen- 
vectors for A. Then the following lemma may be established. 


Lemma 6.7 Suppose that neither of vfqi or vjq n are equal to zero. Then 
_► o for every j such that A n _j > A j > 0 t+1 and for every j such that 
A2 < Aj < Of- 


Proof: Observe that 

(m) ?f'i l m(^)'>i 11 

(rtuxH- 1 ' 1 )* 

Hence 

_(1)2 , 
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2 


< 


< 



[®»(A,)1 

Uv 

L^m(A„)J 


m' 

®-n(A,) 1 

U (l) / 

\7n / 

2 

1 



-» 0 


for all j such that 




is bounded. Now, \P m (A) is monic of degree mp 


|*m(An-l)l 

and has all roots contained in the interval 1). Therefore, 


(I'M*)! / |* m (A_,)|) < 1 

for all A £ ( 0 ; +1 , A n _j],and 

(|*m(A)| / |#»(A a )|) < 1 

for all A € [62 i@t) and the result follows. O 

We are now able to prove the main result. 

Theorem 6.8 Suppose that neither ofvjqx orvJq n are equal to zero. Then 
61 = Ai and 9 k = \ n . 

Proof: Since the sequence is bounded it must have a limit point Vi. 

From our previous result, we know that 

qjv 1 = 0 , Aj £ {6(+\ , A„_i] and A j £ [A 2 , 0 <) • 

Moreover, p ( ™\ A) -> p(A) = n*=i(A - 0 j) and each V * m) satisfies 

v[ m)T p[ m) \A)v\ m) < v[ m)T p\A)v i m) 
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for all monic polynomials p of degree k. Since, 


t>i" ,r pr ), (-4)»!" ) -> -S”’ 1 phaA’"' 

it follows that 

vJp 2 (A)vi < vlp 2 (A)v i 
for all monic polynomials p of degree k. But 

vJp 2 {A)v\ = + 7iP 2 (^i) + 7 2 p 2 (^n) 

where = qjv\, ^2 7 2 = 1. This leads to a contradiction, since it 
to construct another monic polynomial p such that 

ufp(A) 2 Ui = 53 p(^j) 2 7j + 7lP(^l) 2 + 7nP 2 (^n) 

< vJp(A) 2 vl 

as the following Lemma shows. 

Lemma 6,9 Suppose k is even, k > 4, with i s.t. p( A) < 0 on 
Then there is a quadratic polynomial (f> such that 

p(A) = p(A) - <f>(\) 

is monic and satisfies both 

0 < p(Ai) < p(Aj) , 0 < p(A n ) < p(A n ) 

and 

0 > p(A) > p(A) , A 6 {Qe,9 e+ i) . 


is possible 


□ 

(d e , 0(+i). 
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Proof: Take <^>(A) = e(A — 0*)(A — ^+ 1 ), e > 0. Consider the polynomial p. 
Note that p is monic since deg<^> < degp. Moreover, for e sufficiently small 

0 < p(A) = p(A) - <A(A) < p(A) 

since <j>(\) = p(A) -p(A) has at most two roots 0 t ,0e+ 1 , it cannot change sign 
on (0*,0 m-i). 

Now, it is clear for e > 0 sufficiently small 

0 < p( A,) = p(Aj) - e(Aj - 0,)(Ai - 0 e+l ) < p(X l ) 


and 

o > p(A n ) = p(An) - e(A n - 0 t )(X n - 0 e+l ) > p(A„) 

since <}> < 0 for A € {0t,6e+ 1 ). D 

These results indicate that the sort of convergence that takes place will 
typically be slow linear convergence of ||r[ m ^|| to zero with a rate governed by 
the ratio where Aj is the eigenvalue of A in (fit, ). This 

may be seen through an analysis similar to the proof of Lemma(6.7). While 
this iteration does seem to perform reasonably well in practice if one monitors 
the Ritz estimates and halts when these are small for a significant percentage 
of the k eigenvalues actually sought. However, other iterations such as the 
one developed in Section 5, might do a better job of evenly distributing the 
components of the vector ui along the k wanted eigenvectors. 
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7 The Generalized Eigenvalue Problem 


In this section the generalized eigenvalue problem will briefly be discussed. 
The generalized problem is to find (x, A) such that 

Ax = A Mx . 

Most often, the matrix M arises from applying a Galerkin principle to a 
linear operator C which leads to 

A = ( Cfc , <f>j) , M = (fa, fa) 

where { ,) is an inner product on a linear space H and and 6 W n 

is a basis for a finite element subspace W n C 7~L ■ The matrix M is thus 
symmetric and positive definite. This condition shall be assumed in this 
section. The basic iterative method will carry over to this setting with very 
little modification. It will be necessary to set aside storage for two basis 
matrices V and W and to maintain and update a factorization of the form 

(7.1) AV-WH = re T k 
where 

W = MV , V T W = I , and V T r = 0 . 

Again a simple recursion is available to advance the factorization (7.1) one 
step. Just note that 

I H h 

(7.2) A(V, v) - (W 7 ,n;) 

\ (3eJ a 
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leads to 

AV - WH - 0we T k = 0 

SO 

(r — 0w)ej = 0 . 

But w = Mv , so by solving 


(7.3) Mv = r and putting 0 = (v T r)* 

one may scale by 0 to get 


1 „ 1 
V 0 V ' W 0 

Since V T r = 0, and v T w = 1, it follows that 

V?W + = I k+ 1 , 


where V + = (V,u) and W + = (W,w). Then one has 

t 


h 

a 


V T ^ 


\ vT ) 


Av 


follows from equating the ( k+ 1 )-st column of (7.2) and the updated residual 


r + = Av — (V, v) 

Again this step may be accomplished through two matrix vector products 
followed by two more to accomplish one step of iterative refinement to ensure 
the biorthogonality of V, W. 

There are two key consequences of this arrangement: 
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1. Q t V t WQ = I is preserved so the implicit Q-R shift strategy may be 
applied. 

2. If A = A T is symmetric, then 

H = V t AV 

follows from V T W = /, V T r — 0 so that H = H T will be symmetric 
and tridiagonal when A is symmetric. 

With these observations, it is straightforward to adapt the algorithms 
previously discussed to solve the generalized eigenproblem. Some limited 
computational experience with this approach is the subject of the following 
section. 


8 Computational Results and Conclusions 

Computational results for this technique are quite promising but are certainly 
preliminary. There is a Fortran implementation of the algorithms developed 
here. Two versions of the code have been produced. One of these implements 
the strategy for the generalized symmetric eigenvalue problem as described 
in Section 7. The other implements the algorithm for the standard non- 
symmetric eigenproblem. In addition to exhibiting behavior on some test 
problems, two experiences with applications will be discussed. Finally, some 
very interesting illustrations of the shapes of the filter polynomials that are 
constructed through exact shifts shall be reported. 

There are some important details of the Fortran implementation of Al- 
gorithm (4.7). Step 3 requires a user supplied matrix vector product. Steps 
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4 and 5 are implemented through calls to the level 2 BLAS [11,12] routine 
DGEMV. One step of iterative refinement is carried out at Step 6 of Algo- 
rithm (4.7) rather than iterating until the test ||s|| < e||r|| is passed. Steps 
(6.1) and (6.2) were also implemented through calls to DGEMV. In all of 
the computations observed there was never a loss of orthogonality in the 
columns of V. In all cases \\V T V - /|| was on the order of unit roundoff 
error. Eigenvalue calculations used a slight modification of EISPACK [32] 
subroutines TQL in the symmetric case and HQR in the nonsymmetric case. 
These may be replaced by the corresponding block routines from LAPACK 
[10] to enhance performance in the future. 

Expressing the algorithm in terms of the level 2 BLAS has provided the 
means to achieve high performance portable Fortran code. The code has been 
run on SUN SPARC, CONVEX Cl, Stardent Titan, CRAY 2, and CRAY 
YMP computers. The cost of operations were clearly dominated by the 
user supplied matrix vector products (and system solves in the generalized 
problem). The time spent in the user supplied portion was orders of mag- 
nitude over the time spent in the other parts of the eigenvalue calculations. 
This performance characteristic is a direct consequence of the performance 
of DGEMV on the architectures of the machines listed above. The crucial 
point for improving the algorithm is to better understand the construction 
of the filter polynomials in order to reduce the required number of user sup- 
plied matrix vector products . Parallelism may be invoked through the level 
2 BLAS and also through the user supplied matrix vector product. 

In all of the results reported below, exact shifts were used as described 
in Algorithm (4.10). The iteration was halted when || (e^ t/j)r*|| < 10 -7 , 1 < 
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j < k — 3 where y : is the j — th Ritz vector corresponding to Ritz values 
approximating the wanted spectrum. This ad hoc stopping rule allowed 
the iteration to halt quite early in cases where it was difficult to make a 
clean separation between the wanted and unwanted spectrum. This ad hoc 
criterion will have to be replaced with a more rigorous one in the future. 

In the first set of test problems the matrix A arises from a standard 5- 
point discretization of the convection-diffusion operator on the unit square 
fi. The PDE is 

— Au + pu x = Xu, in ft, u|an = 0 

When p = 0 the matrix A is the discrete Laplacian and for p > 0 A has 
distinct complex eigenvalues which appear in a rectangular grid in the com- 
plex plane when the cell size h — l/(n + 1) is large enough with respect 
to the parameter p. However, the boundary conditions of the continuous 
problem do not admit eigenfunctions corresponding to complex eigenvalues, 
so the eigenvalues of the matrix A become real when the mesh size becomes 
small enough. The order of the discrete operator A is N = n 2 and since it’s 
eigenvalues are distinct, it is diagonalizable. These problems allowed testing 
of the algorithm for accuracy and performance in some interesting but well 
understood cases. In both of the tables below, the values k = 10 and p = 10 
were used. The two columns on the right of the tables give the norm of 
the residual vector r and the norm of the true residual ||j4a: — xA|| for the 
sixth eigenvalue. Typically, the eigenvalues of smaller index had residuals 
that were smaller than this one. For the symmetric problems the residual 
estimates were uniformly small for the eight smallest eigenvalues. 
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Table 8.1 


Discrete Laplacian 


Dimension 

Niters 

INI 

||Ax — xA|| 

100 

12 

1.4-06 

3D-15 

256 

23 

3.4-06 

5D-15 

400 

29 

6.5-06 

5D-15 

625 

25 

7.1-06 

3D-14 

900 

29 

6.2-06 

2D-14 

1600 

43 

2.9-06 

6D-14 

2500 

50 

1.1-05 

9D-13 

3600 

63 

9.9-06 

4D-11 

4900 

92 

8.9-06 

ID-11 

8100 

237 

1.1-05 

ID-11 

10000 

165 

1.1-05 

8D-12 


In Table 8.2 below, the problems of order 256 and 400 did not satisfy the 
convergence test before the maximum number of iterations allowed had been 
reached. In all cases the ten eigenvalues of smallest real part were sought. 
In both of the problems just mentioned, five or more eigenvalues had been 
determined to high accuracy. In all cases the iterations could have halted 
much earlier if a better stopping criterion were devised. 
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Table 8.2 

Convection Diffusion 


Dimension 

Niters 

IMI 

||j4x — xA|| 

100 

61 

5.3-06 

ID-12 

256 

100 

.23 

ID-5 

400 

100 

5.2-03 

2D-10 

625 

77 

2.3-06 

8D-12 

900 

153 

8.9-06 

2D- 14 

1600 

103 

7.4-06 

6D-14 


The second set of results will briefly describe two problems that arise in 
the context of solving partial differential equations. The first of these involves 
a discretization of a membrane problem in which the membrane is composed 
of two materials. On an open bounded connected set Q C R 2 we consider 

— Au = A pu, in 0, u|an = 0 
where the density p is of the form 

p = axs + £(1 - Xs) 

where is the characteristic function of a subset S (Z Cl with area 7 . The 
problem is to determine the density function p which minimizes the lowest 
eigenvalue A^p) of this PDE. Here a and /? are the known (constant) densities 
of two given materials in respective volume fractions 7 /| 0 | and 1 ~ll\Cl\ and 
the set S is occupied by the material with density a. Cox [4] has formulated 
an algorithm to solve this minimization problem. The algorithm generates a 
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sequence of symmetric generalized eigenvalue problems 

Av = A M(p)v 

which arise through a bi-linear finite element discretization of the PDE. The 
density function p is modified at each iteration with the set S determined 
through level sets of the corresponding eigenfunction. The matrix A is pos- 
itive definite and independent of the density function p so the problem was 
cast in the form 

M(p)v = — Av. 

A 

Since only matrix vector products are required of M the dependence on p 
presented no additional computational burden. The matrix A was factored 
once and this factorization was subsequently used repeatedly to solve (7.3) 
(with A playing the role of M in that equation). The eigenvalue iteration also 
benefited from the re-use of the converged starting vector from the previous 
problem but this did not appear to be of great consequence in this case. 
The following table gives results for the same sub-problem on a variety of 
machines. 


Table 8.3 


Membrane Problem on Various Machines 



Sun 

Convex 

Titan 

Y-MP 

Time (secs) 

240 

81 

40.9 

5.4 

matrix vector 

40 

40 

40 

40 

II V T W - 7|| 

io - 14 

io - 14 

10" 14 

IO" 11 
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The overall performance was excellent on this problem. Grid sizes of of 
64 by 64, 100 by 100, and 200 by 200 were used. Both minimization of A^p) 
and \ 2 (p) were done. The number of matrix vector products was typically 
around 32-40 regardless of the dimension of the matrix. That is, with k — 8 
and p = 8 the eigenvalue solver required 3 to 4 iterations with 3 being the 
usual number. The Ritz estimates for \\Ax — A/(p)xA|| were on the order of 
10 D — 14 for the lowest six eigenvalues. 

The second application leads to a nonsymmetric eigenvalue problem. The 
PDE arises in a study of bifurcations in a Couette-Taylor wavy vortex in- 
stability calculation. This work described in [13] is based upon a method of 
W.S. Edwards and L.S Tuckerman which is designed to study these bifurca- 
tions from Taylor vortices to wavy vortices. The discrete problem is obtained 
by first linearizing the Navier-Stokes equations about a (numerically) known 
steady state solution U corresponding to Taylor vortices. The perturbation 
u corresponding to wavy vortices is found by solving the linearized Navier- 
Stokes problem 

i 

— = -( U • V)u - (u • V)C/ - Vp + t/V 2 u 

dt 


with 


V ■ u = Oand ujao = 0 


where Q is the annular region between two concentric rotating cylinders. 
This PDE is discretized to then yield a nonsymmetric eigenvalue problem 


A(u)v — \v 

Since a pseudo-spectral method is used, the discrete matrix is dense rather 
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than sparse. However, matrix vector products can still be performed rapidly 
using Fourier transforms. The discrete problem involved a matrix of order 
2380 . The eigenvalue code with k = 16 and p = 40 required 60 iterations 
to produce eight eigenvalues and corresponding eigenvectors with largest real 
part. This entailed about 2400 matrix vector products. The accuracy of these 
were confirmed to be at least five significant digits. Edwards in a private 
communication remarked that in his opinion ”the high accuracy could not 
have been achieved by other methods I might have tried.” 

This behavior of the algorithm on these two problems seems to be typical 
on more difficult problems. The number of matrix vector products tends to 
be near n for difficult nonsymmetric problems. Symmetric generalized eigen- 
value problems from finite element analysis of structures or membranes seem 
to be solved very rapidly if posed in terms of finding the largest eigenvalues. 

To close this section, the interesting behavior of filtering polynomials 
associated with the choice of exact shifts will be presented. Two problems 
will be discussed. The first example arises from the convection diffusion 
above with p = 40 . The grid size was 1/30 leading to a nonsymmetric 
matrix of order 900 . The results for this problem are displayed in Figures 
8.1 and 8.2. The second example is the banded Toeplitz matrix used for 
test purposes by Grcar [17]. This matrix is non-normal and has a nontrivial 
pseudo-spectrum as discussed in [21]. (The c pseudo-spectrum of a matrix A 
is {A € C : || (A7 - >4) -1 || > e -1 } ). The matrix is a 5-diagonal matrix with 
the value -1 on the first sub-diagonal and the value 1 on the main diagonal 
and the next three super diagonals. The results for this problem are displayed 
in Figures 8.3 and 8.4. 
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The graphs shown below depict the filter polynomial for values of A 
over a region containing the eigenvalues of A. The surface plot is of \ip\ and 
the contour plots are of log(\ip\) The + symbols show the location of the true 
eigenvalues of A The o symbols mark the location of the eigenvalues of H 
that are ” wanted”. These will eventually converge to eigenvalues of A. The 
* symbols show the roots of the polynomial 0. 

Figure 8.1 

Convection Diffusion: iteration 1 
Figure 8.2 

Convection Diffusion: at convergence 

In Figures 8.1 and 8.2 the values k = 10, p = 10 were used. One may 
observe convergence by looking at the 10 leftmost o symbols enclosing the + 
symbols. The interesting features of these filter polynomials is that they are 
remarkably well behaved in terms of being very flat in the region that is to 
be damped and very steep outside that region. The reason for this desirable 
behavior is not understood at the moment. 

Figure 8.3 Grcar matrix: iteration 1 

Figure 8.3 

Grcar matrix : iteration 1 
Figure 8.4 

Grcar matrix : at convergence 
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In Figures 8.3 and 8.4 the corresponding behavior of the filter polynomials 
is shown. In these figures only the upper half-plane is shown. The dotted 
line shows the boundary of the practical spectrum [21] for this matrix. It 
is interesting to note how the contours of the filter polynomial obtained 
through the exact shifts mimic the shape of this boundary. The algorithm 
claimed convergence of the leftmost eigenvalues (ie. the ten eigenvalues of 
smallest real part). However, as demonstrated in the figure, these are pseudo- 
eigenvalues. Interestingly enough, HQR from Eispack will give the same 
behavior if applied to the transpose of the Grcar matrix. HQR will give the 
correct eigenvalues when applied to the Grcar matrix directly and it was used 
to calculate the values of the "true” spectrum shown above. 

In conclusion, it seems that this is quite a promising approach. A direct 
relationship to the implicitly shifted QR iteration has been established and 
several problems inherent to the traditional Arnoldi method have been ad- 
dressed through this new approach. The most important of these are the fixed 
storage, maintenance of orthogonality, and avoidance of spurious eigenvalues. 
The computational results are clearly preliminary. The limited experience 
indicates research is needed in constructing filter polynomials which have bet- 
ter properties with respect to the wanted part of the spectrum. Moreover, 
a better understanding of the Ritz convergence estimates in the nonsym- 
metric case would be helpful. These estimates have been very important in 
terminating the iteration early (ie. before the residual is very small) in the 
symmetric (generalized) eigenproblem, A criterion for choosing the values 
of k and p is also required. At present, ad hoc choices are made and there 
is little understanding of the relation of these two parameters to each other 


49 



and to the given problem. They have been chosen through experimentation 
for these results. 

Future research on this topic might include a blocked variant to better 
deal with multiple eigenvalues. Investigations of the use of a preconditioner 
would also be interesting. Finally, extensions of this idea to other settings 
such as the solution of linear systems would seem to be a promising area of 
research as well. These investigations are underway and will be the topic of 
subsequent papers. 


Acknowledgements 

I would like to acknowledge Dr. Phuong Vu and Cray Research for providing 
access to CRAY-2 and CRAY Y-MP computers and for help in performing a 
number of numerical experiments with the computer codes. I would also like 
to thank Dr. Lothar Reichel for reading the manuscript and making some 
useful comments and corrections. 


References 

1. W.E. Arnoldi, The principle of minimized iterations in the solution of 
the matrix eigenvalue problem, Quart. Appl. Math. 9, 17-29,(1951) . 

2. F. Chatelin and D. Ho, Arnoldi-Tchebychev procedure for large scale 
nonsymmetric matrices, Math. Modeling and Num. Analysis, 24,53- 
65,(1990). 


50 


3. A.T. Chronopoulos, s-Step Orthomin and GMRES implemented on 
parallel computers, Dept. Computer Science Report TR 90- 15, University 
of Minnesota, Minneapolis, Minn. ,(1989). 

4. S. Cox, The symmetry, regularity, and computation of a membrane 
interface, Dept. Math. Sci. Rept., Rice University, Houston, TX, 
(1990). 

5. J. Cullum and W.E. Donath, A block Lanczos algorithm for computing 
the q algebraically largest eigenvalues and a corresponding eigenspace 
for large, sparse symmetric matrices, in Proc. 1974 IEEE Conference 
on Decision and Control, IEEE Press, New York, 505-509, (1974). 

6. J. Cullum, The simultaneous computation of a few of the algebraically 
largest and smallest eigenvalues of a large, symmetric, sparse matrix, 
BIT 18, 265-275, (1978). 

7. J. Cullum and R.A. Wiloughby, Computing eigenvalues of very large 
symmetric matrices - an implementation of a Lanczos algorithm with 
no reorthogonalization , J. Comput. Phys. 434, 329-358, (1981). 

8. J. Cullum and R.A. Wiloughby, Lanczos Algorithms for Large Sym- 
metric Eigenvalue Computations, Vol. I Theory, Birkhauser Boston, 
Inc., (1985). 

9. J. Daniel, W.B. Gragg, L. Kaufman, G.W. Stewart, Reorthogonaliza- 
tion and stable algorithms for updating the Gram-Schmidt QR factor- 
ization, Math. Comp. 30, 772-795,(1976). 


51 



10. J.W. Demmel, J.J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammar- 
ling, and D. Sorensen, A prospectus for the development of a linear al- 
gebra library for high performance computers, Mathematics and Com- 
puter Science Division Rept. ANL-MCS-TM-97, Argonne National 
Laboratory, (1987). 

11. J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson, An ex- 
tended set of fortran basic linear algebra subprograms, ACM Trans. 
Math. Soft. 14, 1-17, (1988). 

12. J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson, Algorithm 
656 An extended set of fortran basic linear algebra subprograms: Model 
implementation and test programs, ACM Trans. Math. Soft. 14, 18- 
32, (1988). 

13. W.S. Edwards, S.R. Beane, S. Varma, Onset of Wavy Vortices in the 
Finite-Length Couette-Taylor Problem, submitted to Physics of Flu- 
ids, (1990) 

14. J.G.F. Francis, The QR transformation: A unitary analogue to the LR 
transformation, Parts I and II, Comp. J. 4, 265-272, 332-345, (1961). 

15. G.H. Golub, R. Underwood, and J.H. Wilkinson, The Lanczos algo- 
rithm for the symmetric Ax = XBx problem, Report STAN-CS-72- 
270, Department of Computer Science, Stanford U. Stanford, California 
,(1972). 


52 



16. G.H. Golub and C.F. Van Loan, Matrix Computations, The Johns 
Hopkins University Press, Baltimore, Maryland (1983). 

17. J.F. Grcar, Operator coefficient methods for linear equations, Sandia 
National Lab. Rept. SAND89-8691, Livermore, California,(1989). 

18. W. Karush, An iterative method for finding characteristic vectors of a 
symmetric matrix, Pacific J. Math. 1, 233-248, (1951). 

19. C. Lanczos, An iteration method for the solution of the eigenvalue 
problem of linear differential and integral operators, J. Res. Nat. Bur. 
Stand. , 45, 255-282, (1950). 

20. T.A. Manteuffel, Adaptive procedure for estimating parameters for the 
nonsymmetric Tchebychev iteration, Numer. Math. 31,183-208,(1978). 

21. N. Nachtigal, L. Reichel, L.N. Trefethen, A hybrid GMRES algorithm 
for nonsymmetric matrix iterations, Numerical Analysis Rept. 90-7, 
Dept. Mathematics, MIT, Cambridge, Mass., (1990). 

22. C.C. Paige, The Computation of Eigenvalues and Eigenvectors of Very 
Large Sparse Matrices, Ph.D. thesis, Univ. of London, (1971). 

23. C.C. Paige, Computational variants of the Lanczos method for the 
eigenproblem, J. Inst. Math. Appl. 10, 373-381, (1972). 

24. B.N. Parlett and D. S. Scott, The Lanczos algorithm with selective 
orthogonalization, Math. Comp. 33, 311-328, (1979). 


53 



25. B.N. Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, En- 
glewood Cliffs, NJ. (1980). 

26. L. Reichel and W.B. Gragg, Fortran subroutines for updating the QR 
Decomposition of a matrix, ACM-TOMS, (to appear). 

27. A. Ruhe, Rational Krylov sequence methods for eigenvalue computa- 
tion, Linear Algebra Apps. , 58, 391-405, (1984). 

28. Y. Saad, Chebyshev acceleration techniques for solving nonsymmetric 
eigenvalue problems, Math. Comp., 42, 567-588, (1984). 

29. Y. Saad, Projection methods for solving large sparse eigenvalue prob- 
lems, in Matrix Pencil Proceedings ( B. Kagstrom , and A. Ruhe, 
eds), Springer- Verlag, Berlin, 121-144 (1982). 

30. Y. Saad and M. Schultz, GMRES: A generalized minimal residual algo- 
rithm for solving nonsymmetric linear systems, SIAM J. Scientific and 
Stat. Comp. 7, 856-869, (1986). 

31. D. Scott, Analysis of the symmetric Lanczos algorithm, Ph.D. disserta- 
tion, Dept, of Mathematics, University of California, Berkeley, (1978). 

32. B.T. Smith, J.M. Boyle, J.J. Dongarra, B.S. Garbow, Y. Ikebe, V.C. 
Klema, and C.B. Moler, Matrix Eigensystem Routines - EISPACK 
Guide, Lecture Notes in Computer Science, Vol. 6, 2nd edition, Springer- 
Verlag, Berlin, 1976. 

33. G.W. Stewart, Introduction to Matrix Computations, Academic Press, 
New York, 1973. 


54 



34. H.F. Walker, Implementation of the GMRES method using House- 
holder transformations, SIAM J. Scientific and Stat. Comp. 9, 152- 
163,(1988). 

35. J.H. Wilkinson, The Algebraic Eigenvalue Problem, Claredon Press, 
Oxford, England (1965). 



Implicit Application of Polynomial Filters 
in a Jfc-Step Arnoldi Method 


D. C. Sorensen 


RIACS Technical Report 90.43 
October 1990 





